Robust Hypothesis Testing with $\alpha$-Divergence

Gökhan Gül, Student Member, IEEE, Abdelhak M. Zoubir, Fellow, IEEE


Abstract: A robust minimax test for two composite hypotheses, which are determined by the neighborhoods of two nominal distributions with respect to a set of distances called $\alpha$-divergence distances, is proposed. Sion's minimax theorem is adopted to characterize the saddle value condition. Least favorable distributions, the robust decision rule and the robust likelihood ratio test are derived. If the nominal probability distributions satisfy a symmetry condition, the design procedure is shown to simplify considerably. The parameters controlling the degree of robustness are bounded from above, and the bounds are shown to result from the solution of a set of equations. Simulations evaluate and exemplify the theoretical derivations.

Index Terms —Detection, hypothesis testing, robustness, least 
favorable distributions, minimax optimization, likelihood ratio 
test. 


I. Introduction 

Decision theory has been an active field of research, benefiting from contributions from several disciplines such as economics, engineering, mathematics and statistics. A decision maker (or a detector) chooses a course of action from several possibilities. A detector is said to be optimal for a particular problem if its decision rule minimizes (or maximizes) a well defined cost function, e.g., the error probability (or the probability of detection) [1]. Besides being a truly interdisciplinary subject of research, decision theory finds applications in many areas of engineering, e.g., radar, sonar, seismology, communications and biomedicine. For some applications, such as image and speech classification or pattern recognition, interest is in a statistical test that performs well on average. However, for safety oriented applications such as seismology or forest fire detection, as well as for biomedical applications such as early cancer detection from magnetic resonance or X-ray images, interest is in maximizing the worst case performance, because the consequences of an incorrect decision can be severe [1].

In general, any practical application of decision theory can be formulated as a hypothesis testing problem. For binary hypothesis testing, it is assumed that under each hypothesis $\mathcal{H}_i$, the received data $y = (y_1, \ldots, y_n) \in \Omega$ follows a particular distribution $F_i$ corresponding to a density function $f_i$, $i \in \{0,1\}$. A decision rule $\delta$ partitions the whole observation space $\Omega$ into non-overlapping regions corresponding to each hypothesis. The optimality of the decision rule $\delta$ depends on the correctness of the assumption that the data $y$ follows $F_i$. However, in many practical applications $F_0$ and/or $F_1$ are only partially known or are affected by some secondary physical effects that go unmodeled [2].

Imprecise knowledge of $F_0$ or $F_1$ leads, in general, to performance degradation. A useful approach is to extend the known model by accepting a set of distributions $\mathscr{F}_i$ under each hypothesis $\mathcal{H}_i$, populated by probability distributions $G_i$ that lie in a neighborhood of the nominal distribution $F_i$ with respect to some distance [3], [4]. Under some mild conditions on $\mathscr{F}_i$, it can be shown that the best (error minimizing) decision rule $\hat\delta$ for the worst case (error maximizing) pair of probability distributions $(\hat G_0, \hat G_1) \in \mathscr{F}_0 \times \mathscr{F}_1$ accepts a saddle value. Therefore, such a test design guarantees a certain level of detection at all times. This type of optimization is known as minimax optimization, and the corresponding worst case distributions $(\hat G_0, \hat G_1)$ are called least favorable distributions (LFDs) a.

The literature in this field is unfortunately not rich. One of the earliest and probably the most crucial works goes back to Huber, who proposed a robust version of the probability ratio test for the $\epsilon$-contamination and total variation classes of distributions a. He proved the existence of least favorable distributions and showed that the corresponding robust test was a censored version of the nominal likelihood ratio test for both uncertainty classes. In a later work, Huber and Strassen extended the $\epsilon$-contamination neighborhood to a larger class, which includes five different distances as special cases [5]. It was also shown that the robust test resulting from this new neighborhood was still a censored likelihood ratio test. Although found to be less engineering oriented by Levy m, the largest class for which similar conclusions have been made is that of the 2-alternating capacities proposed by Huber and Strassen 0.

Another approach to robust hypothesis testing was proposed by Dabak and Johnson, based on the observation that the choice of measures defining the contamination neighborhoods is arbitrary [7]. They chose the relative entropy (KL-divergence) because it is a natural distance between probability measures and therefore a natural way to define the contamination neighborhoods. Somewhat surprisingly, the robust test which minimizes the KL-divergence between the LFDs obtained from the closed balls with respect to the relative entropy distance was not a clipped likelihood ratio test, but a nominal likelihood ratio test with a modified threshold. It was noted that their approach is not robust for all sample sizes but only when Kullback's theorem is valid, that is, for a large number of observations Q. The difference between the robust tests for the $\epsilon$-contamination and relative entropy neighborhoods lies in the fact that all densities in the class of distributions based on relative entropy are absolutely continuous with respect to the nominal distributions, which is not the case for the $\epsilon$-contamination class.

A question left open by Dabak and Johnson was the design of a robust test for a finite number of samples. Levy answered this question under two assumptions: a monotone increasing nominal likelihood ratio and symmetric nominal density functions ($f_0(y) = f_1(-y)$), where $y \in \mathbb{R}$. He implied that a robust test based on the relative entropy would be more suitable for modeling errors than for outliers, due to the smoothness property (absolute continuity). He also showed that the resulting robust test was equivalent neither to Huber's nor to Dabak's robust test; it was a completely different test [2].

Although the KL-divergence is a smooth and natural distance between probability measures, it is not clear why it should be preferred for building uncertainty sets, especially since there are many other divergences which are also smooth and have nice theoretical properties, e.g. the symmetry property, which the KL-divergence does not have. Besides, theoretically nice properties do not always lead to preferable engineering applications, see for example m p.7]. In this respect, the KL-divergence can be replaced by the $\alpha$-divergence, because the $\alpha$-divergence includes uncountably many distances as special cases, e.g. the $\chi^2$ distance for $\alpha = 2$ 13, reduces to the KL-divergence as $\alpha \to 1$, and shares similar theoretical properties with the KL-divergence such as smoothness, convexity and satisfiability of the (generalized) Pythagorean inequality [10]. Moreover, the flexibility provided by the choice of $\alpha$ results in performance improvements in various signal processing applications and implies the sub-optimality of the KL-divergence. For example, in the design of distributed detection networks with power constraints, the $\alpha$-divergence is considered as the distance between the probability measures, and error exponents of both kinds are maximized over all $\alpha \in (0,1)$ [11]. In non-negative matrix factorization [12], and in indexing and retrieval [13], the optimal value of $\alpha$ (with respect to some objective function) is found to be $1/2$, corresponding to the squared Hellinger distance. In medical applications, e.g. in medical image segmentation [14], restoration [15] and registration [16], the $\alpha$-divergence is considered and the optimal value of $\alpha$ is found to be a non-standard value, i.e. a value which does not correspond to any known distance. There are also theoretical works which take advantage of the $\alpha$-divergence in the design of statistical tests. It is reported for parametric models [17], [18] as well as for non-parametric models [19] that the use of the $\alpha$-divergence as the distance between probability measures, again with some non-standard values of $\alpha$, e.g. $\alpha = 1.6$ in [18] and $\alpha = 1.3$ or $\alpha = 1.5$ in [19], leads to promising results. However, none of the aforementioned works has the property of minimax robustness. Furthermore, in none of them is it possible to adjust the tradeoff between robustness and detection performance. Additionally, the parametric models rely on the possibly invalid assumption that the actual probability distributions can be represented by a parametric model. This motivates the work in this paper: a minimax robust design of hypothesis testing with the $\alpha$-divergence distance, where the robustness is adjustable with respect to the detection performance through the choice of two robustness parameters, $\epsilon_0$ and $\epsilon_1$.

The related literature can be summarized as follows. In 0, the symmetry constraint that was imposed in 12 was removed, considering the squared Hellinger distance. In [20], the number of non-linear equations that need to be solved to design the robust test was reduced, and a formula from which the maximum robustness parameters can be obtained was derived. In ED, robust approaches were extended to distributed detection problems where communication from the sensors to the fusion center is constrained. In a recent work E3, based on the KL-divergence, the monotone increasing likelihood ratio constraint was removed.

In this paper, a minimax robust test for two composite hypotheses, which are formed by the neighborhoods of two nominal distributions with respect to the $\alpha$-divergence, is designed. It is shown that for any $\alpha$, the corresponding robust test is the same and unique. There is no constraint on the choice of nominal distributions. Therefore, our design generalizes 0. Since the $\alpha$-divergence includes the KL-divergence and the squared Hellinger distance as special cases, cf. 13, our work also generalizes the works in 0, E3 and Ea. The advantage of considering the $\alpha$-divergence for modeling errors is that it allows the designer to choose a single parameter that accounts for the distance without carrying out tedious steps of derivations for the design of a robust test. Additionally, the a priori probabilities in our work are not required to be equal, which was assumed in all previous works on model mismatch. An example is cognitive radio, where the primary user may be idle most of the time, i.e. $P(\mathcal{H}_0) \neq P(\mathcal{H}_1)$ E3. Last but not least, the work in this paper allows vector valued observations.

The organization of this paper is as follows. In the following section, some background on the minimax optimization problem is given and the characterization of the saddle value condition is detailed, before the problem definition is stated. Section III is divided into three parts. In the first part, the minimax optimization problem is solved and the least favorable distributions, the robust decision rule as well as the robust likelihood ratio, which are later shown to be determined by solving two non-linear equations, are obtained. The second part shows how the problem is simplified if the nominal probability density functions satisfy the symmetry condition. In the third part, the maximum values of the robustness parameters, above which a minimax robust test cannot be designed, are derived. In Section IV, simulation results that illustrate the validity of the theoretical derivations are detailed. Finally, the paper is concluded in Section V.


II. Problem Formulation

A. Background 

Let $(\Omega, \mathcal{A})$ be a measurable space with the probability measures $F_0$, $F_1$, $G_0$, $G_1$ and $G$ on it, having the density functions $f_0$, $f_1$, $g_0$, $g_1$ and $g$, respectively, with respect to some dominating measure $\mu$, i.e., $F_i, G_i, G \ll \mu$, $i \in \{0,1\}$. It is assumed that the nominal measures are distinct, i.e. the condition $F_0 = F_1$ $\mu$-almost everywhere is not true. Consider the binary composite hypothesis testing problem

$$\begin{aligned} \mathcal{H}_0 &: G = G_0 \\ \mathcal{H}_1 &: G = G_1 \end{aligned} \tag{1}$$

where the measures $G_i$ are defined whenever their corresponding density functions $g_i$ belong to the closed ball

$$\mathscr{G}_i = \{g_i : D(g_i, f_i) \le \epsilon_i\}, \quad i \in \{0,1\}, \tag{2}$$

where $D$ is a distance between the density functions. In other words, every density function $g_i$ which is at least $\epsilon_i$ close to the nominal density $f_i$ is a member of the uncertainty class $\mathscr{G}_i$ and defines $G_i$, $i \in \{0,1\}$. We choose $D$ to be the $\alpha$-divergence, i.e.,

$$D(g, f; \alpha) := \frac{1}{\alpha(1-\alpha)} \left( 1 - \int_\Omega g^{\alpha} f^{1-\alpha} \, d\mu \right), \quad \alpha \in \mathbb{R} \setminus \{0,1\}, \tag{3}$$

since it is a convex distance for every $\alpha$ and it includes various distances as special cases ill p.1536].$^1$ Given that $y \in \Omega$ has been observed, a randomized decision rule $\delta : \Omega \to [0,1]$ maps each $y$ to a real number in the unit interval. Let $\Delta$ be the set of all decision rules (functions). Then, for any possible choice of $\delta \in \Delta$, the following error types are well defined: first, the false alarm probability

$$P_F(\delta, f_0) = \int_\Omega \delta f_0 \, d\mu, \tag{4}$$

second, the miss detection probability

$$P_M(\delta, f_1) = \int_\Omega (1 - \delta) f_1 \, d\mu, \tag{5}$$

and third, the overall error probability

$$P_E(\delta, f_0, f_1) = P(\mathcal{H}_0) P_F(\delta, f_0) + P(\mathcal{H}_1) P_M(\delta, f_1). \tag{6}$$

It is well known that $P_E$ is minimized if the decision rule is chosen to be the likelihood ratio test

$$\delta(y) = \begin{cases} 0, & l(y) < \rho \\ \kappa(y), & l(y) = \rho \\ 1, & l(y) > \rho \end{cases} \tag{7}$$

where $\rho = P(\mathcal{H}_0)/P(\mathcal{H}_1)$ is some threshold, $l(y) := f_1(y)/f_0(y)$ is the likelihood ratio at observation $y$ and $\kappa : \Omega \to [0,1]$.
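The quantities in (3)-(7) can be evaluated numerically once the densities are sampled on a grid. The following is a minimal sketch, not part of the original derivation: it assumes a uniform grid on the real line, uses a plain Riemann sum in place of the integral with respect to $\mu$, and takes two unit-variance Gaussian nominals purely for illustration. The function names are hypothetical.

```python
import numpy as np

def gaussian_pdf(y, mean, std):
    """Density of N(mean, std^2); illustrative nominal only."""
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def alpha_divergence(g, f, dy, alpha):
    """Riemann-sum approximation of D(g, f; alpha) in (3) on a uniform grid."""
    if alpha in (0.0, 1.0):
        raise ValueError("alpha must differ from 0 and 1")
    integral = np.sum(g ** alpha * f ** (1.0 - alpha)) * dy
    return (1.0 - integral) / (alpha * (1.0 - alpha))

def lrt_errors(f0, f1, dy, p0=0.5):
    """P_F, P_M and P_E of (4)-(6) for the likelihood ratio test (7)."""
    rho = p0 / (1.0 - p0)                   # threshold P(H0)/P(H1)
    delta = (f1 / f0 > rho).astype(float)   # ties l = rho have measure zero here
    p_f = np.sum(delta * f0) * dy
    p_m = np.sum((1.0 - delta) * f1) * dy
    return p_f, p_m, p0 * p_f + (1.0 - p0) * p_m

# Assumed example: f0 = N(-1, 1), f1 = N(1, 1) on a fine grid.
y = np.linspace(-12.0, 12.0, 48001)
dy = y[1] - y[0]
f0 = gaussian_pdf(y, -1.0, 1.0)
f1 = gaussian_pdf(y, 1.0, 1.0)

d_half = alpha_divergence(f1, f0, dy, 0.5)  # scaled squared Hellinger distance
d_two = alpha_divergence(f1, f0, dy, 2.0)   # alpha = 2 member of the family
p_f, p_m, p_e = lrt_errors(f0, f1, dy)
```

For this Gaussian pair the integral in (3) has the closed form $\exp(-2\alpha(1-\alpha))$, which gives a simple way to sanity-check the discretization.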

B. Saddle value specification 

In this section, the existence of a saddle value for the minimax optimization problem is established from its functional topology. The minimax theorem attributed to John von Neumann gives conditions under which the existence of a saddle value is guaranteed [24]. However, it is applicable only if both sets over which the maximization and minimization are performed are compact. Note that the closed balls $\mathscr{G}_0$ and $\mathscr{G}_1$ with respect to the $\alpha$-divergence distance are not compact; therefore, von Neumann's minimax theorem is not applicable in our case. Here, we adopt Sion's minimax theorem [25],

$$\sup_{(g_0,g_1) \in \mathscr{G}_0 \times \mathscr{G}_1} \min_{\delta \in \Delta} P_E(\delta, g_0, g_1) = \min_{\delta \in \Delta} \sup_{(g_0,g_1) \in \mathscr{G}_0 \times \mathscr{G}_1} P_E(\delta, g_0, g_1), \tag{8}$$

which removes the compactness constraint on the set over which the maximization is performed.

$^1$Notice that the $\alpha$-divergence is preferred over Rényi's $\alpha$-divergence because Rényi's $\alpha$-divergence is convex only for $\alpha \in [0,1]$ (2l p.1540].

In order for (8) to be valid, the following conditions must hold:

• The objective function $P_E(\delta, \cdot)$ is real valued, upper semi-continuous and quasi-concave on $\mathscr{G}_0 \times \mathscr{G}_1$ for all $\delta \in \Delta$

• The objective function $P_E(\cdot, (g_0, g_1))$ is lower semi-continuous and quasi-convex on $\Delta$ for all $(g_0, g_1) \in \mathscr{G}_0 \times \mathscr{G}_1$

• $\Delta$ is a compact convex subset of a linear topological space

• $\mathscr{G}_0 \times \mathscr{G}_1$ is a convex subset of a linear topological space

The first two conditions hold true because $P_E$ is a real valued continuous function and is linear in each of the three terms $\delta$, $g_0$ and $g_1$, therefore both convex and concave. The last condition is also true because all convex combinations of $g_i^0 \in \mathscr{G}_i$ and $g_i^1 \in \mathscr{G}_i$ are in $\mathscr{G}_i$, since $D$ is a convex distance, and the Cartesian product of convex sets is again a convex set. Similarly, $\Delta$ is a convex set because for any $t \in [0,1]$ and for all $\delta_0, \delta_1 \in \Delta$, $t\delta_0 + (1-t)\delta_1 \in \Delta$. Note that any continuous function is also upper and lower semi-continuous, and any convex function is also quasi-convex. Lastly, $\Delta$, which is equivalent to $[0,1]^{\Omega}$ in an infinite dimensional vector space, is the product of uncountably many compact sets $[0,1]$. According to Tychonoff's theorem, $\Delta$ is compact with respect to the product topology [26], [27]. Note that any finitely supported discretization of $g_0$ and $g_1$ makes both $\mathscr{G}_0 \times \mathscr{G}_1$ and $\Delta$ compact with respect to the standard topology. This is a straightforward result of the Heine-Borel theorem [28, Theorem 2.41].

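Accordingly, based on Sion's minimax theorem, there exists a saddle value for the objective function $P_E$, i.e.,

$$P_E(\delta, \hat g_0, \hat g_1) \ge P_E(\hat\delta, \hat g_0, \hat g_1) \ge P_E(\hat\delta, g_0, g_1). \tag{9}$$

Since $P_E$ is separable in $g_0$ and $g_1$, we also have

$$\begin{aligned} P_F(\hat\delta, g_0) &\le P_F(\hat\delta, \hat g_0) \\ P_M(\hat\delta, g_1) &\le P_M(\hat\delta, \hat g_1). \end{aligned} \tag{10}$$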

C. Problem definition 

Based on (10), the minimax optimization problem (8) can be solved considering the Karush-Kuhn-Tucker (KKT) multipliers. Hence, the problem formulation can be restated as

$$\begin{aligned} \text{Maximization:} \quad & \hat g_0 = \arg\sup_{g_0 \in \mathscr{G}_0} P_F(\delta, g_0) \quad \text{s.t.} \quad g_0 > 0, \ \Upsilon(g_0) = \int_\Omega g_0 \, d\mu = 1 \\ & \hat g_1 = \arg\sup_{g_1 \in \mathscr{G}_1} P_M(\delta, g_1) \quad \text{s.t.} \quad g_1 > 0, \ \Upsilon(g_1) = \int_\Omega g_1 \, d\mu = 1 \\ \text{Minimization:} \quad & \hat\delta = \arg\min_{\delta \in \Delta} P_E(\delta, \hat g_0, \hat g_1). \end{aligned} \tag{11}$$

In (11), there are two separate maximization problems, which are coupled with the minimization problem through the decision rule $\delta$.

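III. Robust Detection with $\alpha$-Divergence

The following theorem provides a solution for (11), which is composed of the least favorable densities $\hat g_0$ and $\hat g_1$, the robust decision rule $\hat\delta$ and the robust likelihood ratio function $\hat l = \hat g_1/\hat g_0$ in parametric forms, as well as two non-linear equations from which the parameters can be obtained. Before the statement of the theorem, let $l_l$ and $l_u$ be two real numbers with $0 < l_l < 1 < l_u < \infty$. Furthermore, let

$$k(l_l, l_u) = \frac{\int_{I_1} (l - l_l) f_0 \, d\mu}{\int_{I_3} (l_u - l) f_0 \, d\mu}, \tag{12}$$

$$z(l_l, l_u; \alpha, \rho) = \int_{I_1} f_1 \, d\mu + k(l_l, l_u) \int_{I_3} f_1 \, d\mu + \int_{I_2} \left( \frac{k(l_l,l_u)^{\alpha-1} \left( l_l^{\alpha-1} - l_u^{\alpha-1} \right)}{l_l^{\alpha-1} - (k(l_l,l_u) l_u)^{\alpha-1} + \left( k(l_l,l_u)^{\alpha-1} - 1 \right) (l/\rho)^{\alpha-1}} \right)^{\frac{1}{\alpha-1}} f_1 \, d\mu, \tag{13}$$

where

$$\begin{aligned} I_1 &= \{y : l(y) < \rho l_l\} = \{y : \hat l(y) < \rho\} \\ I_2 &= \{y : \rho l_l \le l(y) \le \rho l_u\} = \{y : \hat l(y) = \rho\} \\ I_3 &= \{y : l(y) > \rho l_u\} = \{y : \hat l(y) > \rho\} \end{aligned}$$

and

$$\Phi_1(l, l_l, l_u; \alpha, \rho) = \frac{1}{z(l_l, l_u; \alpha, \rho)} \left( \frac{k(l_l,l_u)^{\alpha-1} \left( l_l^{\alpha-1} - l_u^{\alpha-1} \right)}{l_l^{\alpha-1} - (k(l_l,l_u) l_u)^{\alpha-1} + \left( k(l_l,l_u)^{\alpha-1} - 1 \right) (l/\rho)^{\alpha-1}} \right)^{\frac{1}{\alpha-1}} \tag{14}$$

with $\Phi_0 = \Phi_1 l \rho^{-1}$.

Theorem III.1. The least favorable densities

$$\hat g_0 = \begin{cases} \dfrac{l_l}{z(l_l, l_u; \alpha, \rho)} f_0, & l < \rho l_l \\ \Phi_0(l, l_l, l_u; \alpha, \rho) f_0, & \rho l_l \le l \le \rho l_u \\ \dfrac{k(l_l, l_u) \, l_u}{z(l_l, l_u; \alpha, \rho)} f_0, & l > \rho l_u \end{cases} \tag{15}$$

$$\hat g_1 = \begin{cases} \dfrac{1}{z(l_l, l_u; \alpha, \rho)} f_1, & l < \rho l_l \\ \Phi_1(l, l_l, l_u; \alpha, \rho) f_1, & \rho l_l \le l \le \rho l_u \\ \dfrac{k(l_l, l_u)}{z(l_l, l_u; \alpha, \rho)} f_1, & l > \rho l_u \end{cases} \tag{16}$$

and the robust decision rule

$$\hat\delta = \begin{cases} 0, & l < \rho l_l \\ \dfrac{l_l^{\alpha-1} - (l/\rho)^{\alpha-1}}{l_l^{\alpha-1} - (k(l_l,l_u) l_u)^{\alpha-1} + \left( k(l_l,l_u)^{\alpha-1} - 1 \right) (l/\rho)^{\alpha-1}}, & \rho l_l \le l \le \rho l_u \\ 1, & l > \rho l_u \end{cases} \tag{17}$$

implying the robust likelihood ratio function

$$\hat l = \begin{cases} l / l_l, & l < \rho l_l \\ \rho, & \rho l_l \le l \le \rho l_u \\ l / l_u, & l > \rho l_u \end{cases} \tag{18}$$

provide a unique solution to (11). Furthermore, the parameters $l_l$ and $l_u$ can be determined by solving

$$z(l_l, l_u; \alpha, \rho)^{-\alpha} \left( l_l^{\alpha} \int_{I_1} f_0 \, d\mu + \int_{I_2} \tilde\Phi_0(l, l_l, l_u; \alpha, \rho)^{\alpha} f_0 \, d\mu + (k(l_l,l_u) l_u)^{\alpha} \int_{I_3} f_0 \, d\mu \right) = x(\alpha, \epsilon_0) \tag{19}$$

and

$$z(l_l, l_u; \alpha, \rho)^{-\alpha} \left( \int_{I_1} f_1 \, d\mu + \int_{I_2} \tilde\Phi_1(l, l_l, l_u; \alpha, \rho)^{\alpha} f_1 \, d\mu + k(l_l,l_u)^{\alpha} \int_{I_3} f_1 \, d\mu \right) = x(\alpha, \epsilon_1), \tag{20}$$

where $\tilde\Phi_j(l, l_l, l_u; \alpha, \rho) = z(l_l, l_u; \alpha, \rho) \Phi_j$, $j \in \{0,1\}$, and $x(\alpha, \epsilon) = 1 - \alpha(1-\alpha)\epsilon$.

$^2$In general, the arg sup may not always be achieved, since $\mathscr{G}_0$ and $\mathscr{G}_1$ are non-compact sets in the topologies induced by the $\alpha$-divergence distance.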
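The piecewise structure of (17) and (18) can be illustrated with a short sketch. It assumes the parameters $l_l$, $l_u$ and $k(l_l, l_u)$ are already given; the numerical values used below are hypothetical and are not solutions of (19) and (20). The function names are likewise illustrative.

```python
import numpy as np

def robust_likelihood_ratio(l, l_l, l_u, rho=1.0):
    """Robust likelihood ratio (18): scaled outside [rho*l_l, rho*l_u], flat inside."""
    l = np.asarray(l, dtype=float)
    return np.where(l < rho * l_l, l / l_l,
                    np.where(l > rho * l_u, l / l_u, rho))

def robust_decision_rule(l, l_l, l_u, k, alpha, rho=1.0):
    """Robust decision rule (17): 0 below, 1 above, randomized in between."""
    l = np.asarray(l, dtype=float)
    a = alpha - 1.0
    # Clip so that the middle branch is only evaluated on its own region I_2.
    t = (np.clip(l, rho * l_l, rho * l_u) / rho) ** a
    mid = (l_l ** a - t) / (l_l ** a - (k * l_u) ** a + (k ** a - 1.0) * t)
    return np.where(l < rho * l_l, 0.0, np.where(l > rho * l_u, 1.0, mid))

# Hypothetical parameters, chosen only to exercise the three regions.
l_values = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
l_hat = robust_likelihood_ratio(l_values, l_l=0.5, l_u=2.0)
delta_hat = robust_decision_rule(l_values, l_l=0.5, l_u=2.0, k=1.0, alpha=0.5)
```

Note that `robust_likelihood_ratio` does not take $\alpha$ as an argument; $\alpha$ enters only through the values of $l_l$ and $l_u$, while the randomization in `robust_decision_rule` depends on $\alpha$ explicitly.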

A proof of Theorem III.1 is given in three stages. In the maximization stage, the Karush-Kuhn-Tucker (KKT) multipliers are used to determine the parametric forms of the LFDs, $\hat g_0$ and $\hat g_1$, and the robust likelihood ratio function $\hat l$. In the minimization stage, the LFDs and the robust decision rule $\hat\delta$ are made explicit. Finally, in the optimization stage, the four parameters that are needed to design the test are reduced to two parameters without loss of generality.

Proof: 

A. Derivation of LFDs and the robust decision rule 

1) Maximization step: Consider the Lagrangian function

$$L(g_0, \lambda_0, \mu_0) = P_F(\delta, g_0) + \lambda_0 \left( \epsilon_0 - D(g_0, f_0; \alpha) \right) + \mu_0 \left( 1 - \Upsilon(g_0) \right), \tag{21}$$

where $\mu_0$ and $\lambda_0 \ge 0$ are the KKT multipliers. It can be seen that $L$ is a strictly concave functional of $g_0$, as $\partial^2 L / \partial g_0^2 < 0$ for every $\lambda_0 > 0$. Therefore, there exists a unique solution to (21), in case all KKT conditions are met [29, Chapter 5]. More explicitly, the Lagrangian can be stated as

$$L(g_0, \lambda_0, \mu_0) = \int_\Omega \left[ \delta g_0 - \mu_0 g_0 - \frac{\lambda_0}{\alpha(1-\alpha)} \left( (1-\alpha) f_0 + \alpha g_0 - g_0^{\alpha} f_0^{1-\alpha} \right) \right] d\mu + \lambda_0 \epsilon_0 + \mu_0. \tag{22}$$

Note that, similar to m, the positivity constraint $g_0 \ge 0$ (or $g_1 \ge 0$) is not imposed, because for some $\alpha$ this constraint is satisfied automatically, while for others each solution of the Lagrangian optimization must be checked for positivity. To find the maximum of (22), the directional (Gateaux) derivative of the Lagrangian $L$ with respect to $g_0$ in the direction of a function $\psi$ is taken:

$$\int_\Omega \left[ \delta - \mu_0 + \frac{\lambda_0}{1-\alpha} \left( \left( \frac{g_0}{f_0} \right)^{\alpha-1} - 1 \right) \right] \psi \, d\mu. \tag{23}$$

$^2$(continued) In this paper, the existence of $\hat g_0$ and $\hat g_1$ is due to the KKT solution of the minimax optimization problem, which is introduced in Section III.
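Since $\psi$ is arbitrary, $L$ is maximized whenever

$$\delta - \mu_0 + \frac{\lambda_0}{1-\alpha} \left( \left( \frac{g_0}{f_0} \right)^{\alpha-1} - 1 \right) = 0. \tag{24}$$

Solving (24), the density function of the LFD $\hat G_0$,

$$\hat g_0 = \left( \frac{1-\alpha}{\lambda_0} (\mu_0 - \delta) + 1 \right)^{\frac{1}{\alpha-1}} f_0, \tag{25}$$

is obtained. Writing the Lagrangian for $P_M$ in a similar way, with the KKT multipliers $\mu_0 := \mu_1$ and $\lambda_0 := \lambda_1$, it follows that

$$\hat g_1 = \left( \frac{1-\alpha}{\lambda_1} (\mu_1 - 1 + \delta) + 1 \right)^{\frac{1}{\alpha-1}} f_1. \tag{26}$$

Accordingly, the robust likelihood ratio function can be obtained as

$$\hat l = \frac{\hat g_1}{\hat g_0} = \left( \frac{\frac{1-\alpha}{\lambda_1} (\mu_1 - 1 + \delta) + 1}{\frac{1-\alpha}{\lambda_0} (\mu_0 - \delta) + 1} \right)^{\frac{1}{\alpha-1}} l. \tag{27}$$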
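As a quick numerical sanity check of the maximization step, the sketch below verifies that the ratio $\hat g_0 / f_0$ in (25) satisfies the stationarity condition (24) for any value of the decision rule between 0 and 1. The multiplier values are arbitrary assumptions chosen so that the base of the power in (25) stays positive; they are not derived from any particular pair of nominal densities.

```python
import numpy as np

def lfd_ratio(delta, lam0, mu0, alpha):
    """Ratio g0/f0 from (25) for a given decision-rule value delta."""
    return ((1.0 - alpha) / lam0 * (mu0 - delta) + 1.0) ** (1.0 / (alpha - 1.0))

def stationarity_residual(ratio, delta, lam0, mu0, alpha):
    """Left-hand side of (24); it should vanish at the solution (25)."""
    return delta - mu0 + lam0 / (1.0 - alpha) * (ratio ** (alpha - 1.0) - 1.0)

delta_grid = np.linspace(0.0, 1.0, 11)  # decision-rule values in [0, 1]
lam0, mu0 = 1.5, 0.4                    # hypothetical KKT multipliers

max_residual = {}
for alpha in (0.5, 2.0):                # one value below and one above 1
    ratio = lfd_ratio(delta_grid, lam0, mu0, alpha)
    residual = stationarity_residual(ratio, delta_grid, lam0, mu0, alpha)
    max_residual[alpha] = float(np.max(np.abs(residual)))
```

The same check applies to (26) after replacing $\delta$ by $1 - \delta$ and the multipliers by $\lambda_1$ and $\mu_1$.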


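2) Minimization step: The minimizing decision function is known to be of type (7), with $l$ to be replaced by $\hat l$ and $\kappa$ to be determined from (27) by solving $\hat l = \rho$ for $\delta := \hat\delta$. For every $\rho$, this results in

$$\hat\delta = \begin{cases} 0, & \hat l < \rho \\ \dfrac{\lambda_0 \lambda_1 \left( 1 - (l/\rho)^{1-\alpha} \right) + (\alpha - 1) \left( \lambda_1 \mu_0 (l/\rho)^{1-\alpha} - \lambda_0 (\mu_1 - 1) \right)}{(-1+\alpha) \left( \lambda_0 + \lambda_1 (l/\rho)^{1-\alpha} \right)}, & \hat l = \rho \\ 1, & \hat l > \rho. \end{cases} \tag{28}$$

Inserting (28) in (25) and (26), the least favorable density functions can be obtained as

$$\hat g_0 = \begin{cases} c_1 f_0, & \hat l < \rho \\ \Phi_0 f_0, & \hat l = \rho \\ c_2 f_0, & \hat l > \rho \end{cases} \qquad \hat g_1 = \begin{cases} c_3 f_1, & \hat l < \rho \\ \Phi_1 f_1, & \hat l = \rho \\ c_4 f_1, & \hat l > \rho \end{cases} \tag{29}$$

where

$$\Phi_0 = \left( \frac{-1 + \lambda_0 + \lambda_1 + \mu_0 + \mu_1 - \alpha(-1 + \mu_0 + \mu_1)}{\lambda_0 + \lambda_1 (l/\rho)^{1-\alpha}} \right)^{\frac{1}{\alpha-1}}, \tag{30}$$

$$\Phi_1 = \left( \frac{-1 + \lambda_0 + \lambda_1 + \mu_0 + \mu_1 - \alpha(-1 + \mu_0 + \mu_1)}{\lambda_1 + \lambda_0 (l/\rho)^{\alpha-1}} \right)^{\frac{1}{\alpha-1}}. \tag{31}$$

In order to determine the unknown parameters, the constraints in the Lagrangian definition, i.e., $D(\hat g_i, f_i; \alpha) = \epsilon_i$ and $\Upsilon(\hat g_i) = 1$, $i \in \{0,1\}$, are imposed. This leads to four non-linear equations:

$$\begin{aligned} c_1 \int_{\hat l < \rho} f_0 \, d\mu + \int_{\hat l = \rho} \Phi_0 f_0 \, d\mu + c_2 \int_{\hat l > \rho} f_0 \, d\mu &= 1, \\ c_3 \int_{\hat l < \rho} f_1 \, d\mu + \int_{\hat l = \rho} \Phi_1 f_1 \, d\mu + c_4 \int_{\hat l > \rho} f_1 \, d\mu &= 1, \\ c_1^{\alpha} \int_{\hat l < \rho} f_0 \, d\mu + \int_{\hat l = \rho} \Phi_0^{\alpha} f_0 \, d\mu + c_2^{\alpha} \int_{\hat l > \rho} f_0 \, d\mu &= x(\alpha, \epsilon_0), \\ c_3^{\alpha} \int_{\hat l < \rho} f_1 \, d\mu + \int_{\hat l = \rho} \Phi_1^{\alpha} f_1 \, d\mu + c_4^{\alpha} \int_{\hat l > \rho} f_1 \, d\mu &= x(\alpha, \epsilon_1), \end{aligned} \tag{32}$$

in four parameters, where $x(\alpha, \epsilon) = 1 - \alpha(1-\alpha)\epsilon$.

3) Optimization step: In this section, the number of equations as well as the number of parameters are reduced. This allows the re-definition of $\hat l$, $\hat\delta$, $\hat g_0$ and $\hat g_1$ in a more compact form. Let $l_l = c_1/c_3$ and $l_u = c_2/c_4$; then $\hat l = \hat g_1/\hat g_0$ from (29) indicates the equivalence of the integration domains $I_1$, $I_2$ and $I_3$ as defined in the statement preceding Theorem III.1. Applying the following steps in (32):

• Consider new domains $I_1$, $I_2$, $I_3$

• Use the substitutions $c_1 := c_3 l_l$ and $c_2 := c_4 l_u$

• Divide both sides of the first two equations by $c_3$

• Equate the resulting equations to each other via $1/c_3$

leads to $c_4 = k(l_l, l_u) c_3$, where $k(l_l, l_u)$ is as defined by (12). Next, the goal is to find a functional $f$ s.t. $\Phi_1 = c_3 f(l, l_l, l_u, \alpha)$. Since $\Phi_0 f_0 \rho = \Phi_1 f_1$, it follows that $\Phi_0 = c_3 f(l, l_l, l_u, \alpha) l \rho^{-1}$; therefore it suffices to evaluate only $\Phi_1$. A step by step derivation of the functional $f$ is given in the Appendix. Accordingly, $\Phi_0$ is also fully specified in terms of the desired parameters and functions. Inserting $\Phi_1$ (which is now a functional of $c_3$, $l$, $l_l$, $l_u$ and $\alpha$), c.f. (31), into the second equation in (32) and noticing that $c_4 = k(l_l, l_u) c_3$ leads to $c_3 = 1/z(l_l, l_u; \alpha, \rho)$, where $z(l_l, l_u; \alpha, \rho)$ is as defined by (13). Applying a similar procedure, which can be found in the Appendix, to $\hat\delta$, c.f. (28), for the case $\hat l = \rho$ leads to the robust decision rule $\hat\delta$ as given by Theorem III.1. The least favorable densities, $\hat g_0$ and $\hat g_1$, and the robust likelihood ratio function $\hat l$ are obtained similarly, by exploiting the connection between the parameters $c_1$, $c_2$, $c_3$, $c_4$ and $l_l$, $l_u$. The same simplifications eventually allow the four equations given by (32) to be rewritten as the two equations stated by Theorem III.1. As mentioned earlier, both $\hat g_0$ and $\hat g_1$ are obtained uniquely from the Lagrangian $L$. Hence, $\hat l = \hat g_1/\hat g_0$ and, as a result, $\hat\delta$ are also unique. It follows that the solution found for (11) by the KKT multipliers approach is unique, as claimed.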

Theorem III.1 can be summarized as illustrated in Figure 1. In other words, for any choice of a pair of nominal density functions $f_0$ and $f_1$, the robustness parameters $\epsilon_0$ and $\epsilon_1$, the Bayesian threshold $\rho$ and the distance parameter $\alpha$, the robust design outputs the least favorable density functions $\hat g_0$ and $\hat g_1$ and the robust decision rule $\hat\delta$. Notice that $\hat g_0$ and $\hat g_1$ are scaled versions (with different scaling factors) of the nominal distributions on $l < \rho l_l$ and $l > \rho l_u$, and in between they are a composition of both nominals, since $\Phi_0$ and $\Phi_1$ are both functionals of $f_0$ and $f_1$. The interpretation of the decision rule $\hat\delta$ is similar, i.e. in the same two regions the robust decision rule is almost surely zero or one, and in between it is a randomized decision rule. The robust version of the nominal likelihood ratio test is a non-linearly transformed version of the nominal likelihood ratios, as illustrated by Figure 2. It is somewhat surprising that the resulting robust likelihood ratio test is the same for the whole family of distances parameterized by $\alpha$. In other words, the robust version of the likelihood ratio test, which is given by (18), is not explicitly a function of $\alpha$.

[Figure 1: block diagram with inputs $f_0$, $f_1$, $\epsilon_0$, $\epsilon_1$, $\rho$, $\alpha$, a block labeled "Minimax Robust Test Design", and outputs $\hat\delta$, $\hat g_0$, $\hat g_1$.]

Fig. 1. Summary of the robust hypothesis testing scheme given by Theorem III.1.

[Figure 2: plot of the robust likelihood ratio against the nominal likelihood ratio.]

Fig. 2. Nonlinearity relating the nominal likelihood ratios to the robust likelihood ratios.

Theorem III.1 is a generalization of [22] in the sense that as $\alpha \to 1$ and $\rho = 1$, the least favorable densities $\hat g_0$ and $\hat g_1$ as well as the robust decision rule $\hat\delta$ reduce to the ones found in m. The flexibility afforded by the generality of considering a set of distances, called the $\alpha$-divergence, over [22] is twofold. First, the designer does not need to search for a suitable distance for modeling errors and, each time, test its applicability to the engineering problem at hand by following tedious steps of derivations. Instead, only the parameter $\alpha$ needs to be determined, which can be done over a training data set using a suitable search algorithm. Second, the a priori probabilities need not be chosen equal. The proposed design with the $\alpha$-divergence covers both cases, in addition to the fact that the choice of the nominal probability distributions does not require any assumption. Additional constraints on the choice of nominal distributions as well as on the robustness parameters simplify the design, as introduced in the next section.


B. Simplified model with additional constraints 

In some cases, evidence that the following assumption holds 
may be available: 

Assumption III.2. The nominal likelihood ratio I is monotone 
and the nominal density functions are symmetric, i.e., fi{y) = 

fo{-y)yy 

If, additionally, the robustness parameters are set to be 
equal, e = eo = ei, or in other words x{a,e) = x{a,€o) = 
x{a,ei), it follows that 


lu = ^/k 
yu = -yi 


S{y) = l-S{-y) 





9 i{y) = 9 o{-y) (33) 


where yi = l~^{li) and = l~^{lu). These relationships are 
straightforward and therefore the proofs are omitted. Notice 
that, due to monotonicity of I, the limits of integrals Ii, I 2 
and I 3 should be re-arranged e.g.. 


1i- = {y- l{y) < ph} 

= {y:y< l-^{pl{yi))} = {y : y < r^pli-y^))}. 


The symmetry assumption implies: 

(Im) 


9i{y) 

fo{-y) 

9o{-y) 

fo{-y) 


( 9o{-y) 

V fiiy) 


fi{y)<iy 


fo{-y)dy 

foi-y)dy 

(34) 


for all a and e and, it also implies l{y) 
result l{y) = l/l{—y) for all y. Hence, 
is a solution and all the simplihcations 
reduces the four equations given by 


C4=il{yu)f fi{y)<iy 

\ —00 


= l/l{—y) and as a 

9i{y) = 9o{-y)^y 

in ( |3^ follow. This 
to two: 


1 + 

fiiy)dy] 


fiiy)dy 


(35) 




and 


C4“ l{yuY 


rvi 


fi{y)dy 


f i + liyup-^ 

1 + iKy)/p)°‘~^ 


k 

j fi{y)dy\ = x{a,e), 


fi{y)dy 


( 36 ) 
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where yUvu) = I ^{pl{-yu)) and yt{yu) = I ^{pl{yu))- 
These two equations can then be combined into a single 
equation 


Kyn 




f- 

vT 


1 + Kyu) 


i + (^(2/)/p) 



1 + (^(2/)/p)“"^ 

vi 


fi{y)dy-x{a,e){l{yu) fi{y)dy 

J —OO 

r 

fiiy)dy+ / 


/i(j/)dy 


Given p and a, ( [38l l, ( [39| ), and ( |40l l can jointly be solved 
to determine the space of maximum robustness parameters. 
As an example, consider = K, p = 1 and a = 1/2. This 
choice of a corresponds to the squared Hellinger distance with 
an additional scaling factor of l/a(l — a) = 4. Let a = 
\/fo{y)fi{y)dy. Then, the Equations ([38l)-(|40li reduce 
to the polynomials in the Lagrangian multipliers Aq and Ai, 


/i(y)dyj =0, 

(37) 


Aq + A^ + 2AoAia — — — 0, 
4 — 8Aq — SA^a — cq = 0, 

4 — 8Ai — SAqu — Cl = 0, 


(41) 

(42) 

(43) 


from where the parameter can easily be determined. Obvi¬ 
ously, the computational complexity is reduced considerably 
with the aforementioned assumptions, i.e., when ( |J7] ) is com¬ 
pared to and ( |20l i. Note that when p = 1, we have 
y* = —yu and p* = and if additionally a —>■ 1, ( |37l ) 
reduces to la, cf. ®. 

C. Limiting Robustness Parameters 

The existence of a minimax robust test strictly depends on 
the pre-condition that the uncertainty sets Qi are distinct. To 
satisfy this condition, Huber suggested to be chosen small, 
see a p.3]. Dabak Q does not mention how to choose the 
parameters, whereas Levy gives an implicit bound as the rela¬ 
tive entropy between the half way density fi /2 = fo^'^fi^'^Iz 
and the nominal density /o, i.e., e < D{fi/ 2 , fo), where z is a 
normalizing constant. In the sequel, we show explicitly which 
pairs of parameters (eo,ei) are valid to design a minimax 
robust test for the a—divergence distance. 

The limiting condition for the uncertainty sets to be disjoint 
is Go = Gi p-a.e. It is clear from the saddle value condition 
( |30l l that for any possible choice of (eo,ei), which results in 
Go = Gi, it is true that Pe <1/2 for all {go x gi) G (/o x Gi. 
Since infinitesimally smaller parameters guarantee the strict 
inequality Pe < 1 / 2 , it is sufficient to determine all pos¬ 
sible pairs which result in Go = Gi- A careful inspection 
suggests that the LEDs are identical whenever li inf I 
and lu —> supL Eor this choice Ii and I 3 are empty sets 
and the density functions under each hypothesis are defined 
only on 12- Without loss of generality, assume that a < 1, 
inf I = 0 and sup I = 00. Eor this choice li ^ 0 implies 
Pi = Ai/(a —1)-|-1 and lu ^ 00 implies po = Ao/(a—1) + 1. 
Inserting these into one of the first two equations in ( [32l l, gives 

[ (Ao/o(y)^"“ + Aip“"Vi(y)^"“)^di/ = {1-a)^ . 

Jn 

(38) 

Similarly, from the third and fourth equations it follows that 

(^Ao/o(y)^ + Aip“"Vi(2/)^"“/o( 2/)^^^^ dp 

= {1 - a)^ x{a, eo) (39) 

and 

(^Ai/i(y)^ + AoP^"“/o(y)^"“/i( 2 /)^'^^ dp 

= {1 - a)^ x{a,ei). (40) 


respectively. Solving (42) and (43) for $\lambda_0$ and $\lambda_1$, respectively,
and inserting the results into Equation (41) we get

$$2\epsilon_1\left(a(\epsilon_0 - 4) + 4\right) - (4a + \epsilon_0 - 4)^2 - \epsilon_1^2 = 0. \tag{44}$$

Equation (44) is quadratic in $a$ and has two roots. One of the
roots results in $a = 1$ for all $\epsilon_0 = \epsilon_1$, which is not plausible.
Therefore, the correct root is

$$a = \frac{1}{16}\left(16 - 4\epsilon_1 + \epsilon_0(\epsilon_1 - 4) - \sqrt{(\epsilon_0 - 8)\,\epsilon_0\,(\epsilon_1 - 8)\,\epsilon_1}\right). \tag{45}$$

Notice that $a$ is symmetric in $\epsilon_0$ and $\epsilon_1$, i.e., $a(\epsilon_0, \epsilon_1) = a(\epsilon_1, \epsilon_0)$
for all $(\epsilon_0, \epsilon_1)$, as expected. Since $0 < a < 1$ is
known a priori, given a choice of $\epsilon_j$, the corresponding $\epsilon_{1-j}$
can be determined from (45) easily, cf. Section IV. A special
case occurs whenever $\epsilon = \epsilon_0 = \epsilon_1$, which simplifies (45) to

$$\epsilon_{\max} = 4 - 2\sqrt{2(1 + a)}. \tag{46}$$

The maximum robustness parameters given by (45) and (46) are
in agreement with the ones found in [20]. The case $\alpha > 1$,
which implies $\mu_0 = \lambda_0/(\alpha - 1)$ and $\mu_1 = \lambda_1/(\alpha - 1)$, can be
examined similarly.
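
The closed forms (45) and (46) are easy to check numerically. The following is a minimal Python sketch, not part of the original derivation: the function names `a_from_eps`, `eps_max` and `residual_44` and the test values are hypothetical, and the formulas are transcribed under the assumption that (44) reads $2\epsilon_1(a(\epsilon_0 - 4) + 4) - (4a + \epsilon_0 - 4)^2 - \epsilon_1^2 = 0$.

```python
import math


def a_from_eps(eps0: float, eps1: float) -> float:
    """Root of the quadratic (44) that is kept in (45)."""
    disc = (eps0 - 8.0) * eps0 * (eps1 - 8.0) * eps1
    return (16.0 - 4.0 * eps1 + eps0 * (eps1 - 4.0) - math.sqrt(disc)) / 16.0


def eps_max(a: float) -> float:
    """Equal-parameter bound (46), i.e. eps0 = eps1 = eps_max."""
    return 4.0 - 2.0 * math.sqrt(2.0 * (1.0 + a))


def residual_44(a: float, eps0: float, eps1: float) -> float:
    """Left-hand side of (44); it vanishes when (a, eps0, eps1) solve (44)."""
    return (2.0 * eps1 * (a * (eps0 - 4.0) + 4.0)
            - (4.0 * a + eps0 - 4.0) ** 2
            - eps1 ** 2)


if __name__ == "__main__":
    a = 0.5  # hypothetical value, chosen only for illustration
    e = eps_max(a)
    print(f"eps_max({a}) = {e:.6f}")
    # Plugging eps0 = eps1 = eps_max back into (45) should return a.
    print(f"a_from_eps(e, e) = {a_from_eps(e, e):.6f}")
    # A non-symmetric pair should still satisfy (44).
    a01 = a_from_eps(0.2, 0.3)
    print(f"residual of (44) at (0.2, 0.3): {residual_44(a01, 0.2, 0.3):.2e}")
```

Substituting $\epsilon_0 = \epsilon_1 = \epsilon_{\max}$ into (45) returns the same $a$, which is the consistency between (45) and (46) stated above.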


IV. Simulations 

In this section, some simulations are performed to illustrate
the theoretical derivations. Consider a simple hypothesis testing
problem

$$\mathcal{H}_0: Y = W, \qquad \mathcal{H}_1: Y = W + A, \tag{47}$$

where $A > 0$ is a known DC signal and $W$ is a random variable
which follows a symmetric Gaussian mixture distribution

$$\tfrac{1}{2}\left(\mathcal{N}(-\mu, \sigma^2) + \mathcal{N}(\mu, \sigma^2)\right), \tag{48}$$

where $\mathcal{N}(\mu, \sigma^2)$ is a Gaussian distribution with mean $\mu$ and
variance $\sigma^2$, and $Y$ is a random variable on $\Omega = \mathbb{R}$, which is
consistent with the data sample $y$. To account for uncertainties
on $Y$ under both hypotheses, let

$$F_0(y) := P(Y < y \mid \mathcal{H}_0) \;\; \forall y, \qquad F_1(y) := P(Y < y \mid \mathcal{H}_1) \;\; \forall y \tag{49}$$

be the nominal distributions, having the density functions $f_0$
and $f_1$ for the binary composite hypothesis testing problem.
Note that the symmetry condition,
$f_1(y) = f_0(-y)$ for all $y$, does not hold, and $l = f_1/f_0$ is
not monotone. Assume $\mu = 2$, $\sigma = 1$ and $A = 1$ and let the
robustness parameters be $\epsilon_0 = 0.02$ and $\epsilon_1 = 0.03$ for the
$(\alpha = 4)$-divergence distance. This example demonstrates an
extreme case, for which no straightforward simplification of
the equations (19) and (20) exists, both in terms of reducing
the number of equations and of reducing the domain of the integrals.
Figure 3 illustrates the nominal density functions $f_0$ and
$f_1$ along with the corresponding
least favorable densities (LFDs) $\hat g_0$ and $\hat g_1$, for equal a
priori probabilities, $\rho = 1$. It can be observed that the LFDs
intersect in three distinct intervals, each in the neighborhood
of $y = -1.5 + j$ for $j \in \{0, 2, 4\}$. In Fig. 4 the same
simulation is repeated for $\rho = 1.2$. In Fig. 5 the nominal
and least favorable likelihood ratios for the same example are
shown. As derived earlier, robustification of the simple
hypothesis test corresponds to a non-linear transformation of
the nominal likelihood ratios.
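
The nominal model of this example can be reproduced in a few lines. The sketch below is only illustrative: it assumes the parameter reading $\mu = 2$, $\sigma = 1$, $A = 1$, and the helper names (`normal_pdf`, `f0`, `f1`, `likelihood_ratio`, `crossings`) are hypothetical. It evaluates the nominal densities only, not the LFDs, and locates the points where $f_1 = f_0$, which fall close to $y = -1.5 + j$, $j \in \{0, 2, 4\}$.

```python
import math


def normal_pdf(y: float, mean: float, sigma: float) -> float:
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def f0(y: float, mu: float = 2.0, sigma: float = 1.0) -> float:
    """Nominal density under H0: symmetric two-component mixture (48)."""
    return 0.5 * (normal_pdf(y, -mu, sigma) + normal_pdf(y, mu, sigma))


def f1(y: float, mu: float = 2.0, sigma: float = 1.0, A: float = 1.0) -> float:
    """Nominal density under H1: the same mixture shifted by the DC signal A."""
    return f0(y - A, mu, sigma)


def likelihood_ratio(y: float) -> float:
    return f1(y) / f0(y)


def crossings(lo: float = -6.0, hi: float = 8.0, steps: int = 200) -> list:
    """Points where f1 = f0, found by a sign scan followed by bisection."""
    diff = lambda y: f1(y) - f0(y)
    step = (hi - lo) / steps
    roots = []
    for k in range(steps):
        a, b = lo + k * step, lo + (k + 1) * step
        if diff(a) * diff(b) < 0.0:
            for _ in range(60):
                m = 0.5 * (a + b)
                if diff(a) * diff(m) <= 0.0:
                    b = m
                else:
                    a = m
            roots.append(0.5 * (a + b))
    return roots


if __name__ == "__main__":
    print("f1 = f0 at:", [round(r, 4) for r in crossings()])
    # The likelihood ratio crosses 1 several times, so it is not monotone.
    print("l(-1), l(1), l(3):",
          [round(likelihood_ratio(y), 3) for y in (-1.0, 1.0, 3.0)])
```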

In the next simulation, all the parameters are fixed as before,
except for $\alpha$. We are especially interested in the change in
the lower and upper thresholds, $l_l$ and $l_u$, for varying $\alpha$.
Figure 6 illustrates the outcome of this simulation for $\rho = 1$.
We can see that $l_l$ and $l_u$ tend to 1 for $\alpha \to \infty$. It is not
straightforward to derive this from (19) and (20) for any $f_0$
and $f_1$. However, if there exists a solution, which is true and
unique by the KKT multipliers approach, it should satisfy
$D(f, g; \alpha) = \epsilon_j$ for any $\alpha > 0$ and for all allowable $\epsilon_j$, cf.
Section III-C. Assume that $g$ is fixed and does not depend
on $\alpha$. Then, the integral $\int_{\Omega} g^{\alpha} f^{1-\alpha}\, \mathrm{d}\mu$ is 1 at $\alpha = 0$ and
$\alpha = 1$, convex in $\alpha$, and positive for all $\alpha > 0$, $f$ and $g$.
Hence, $\lim_{\alpha \to \infty} \int_{\Omega} g^{\alpha} f^{1-\alpha}\, \mathrm{d}\mu = \infty$ and $\lim_{\alpha \to \infty} D(f, g; \alpha)$
is indeterminate. Using L'Hospital's rule twice we obtain

$$K = \lim_{\alpha \to \infty} D(g, f; \alpha) = \lim_{\alpha \to \infty} \frac{1}{2} \int_{\Omega} \log^2(g/f)\,(g/f)^{\alpha} f\, \mathrm{d}\mu.$$

The integral $\int_{\Omega} \log^2(g/f)\,(g/f)^{\alpha} f\, \mathrm{d}\mu$ is also positive and
convex in $\alpha$. This implies $K \to \infty$ for $\alpha \to \infty$. Now, assume
that $g$ depends on $\alpha$ and tends to a limiting distribution $g^*$ with
$\|g^* - f\| > 0$ when $\alpha \to \infty$. Then, our conclusion does not
change, i.e., $K \to \infty$ for $\alpha \to \infty$. Since $D(f, g; \alpha)$ is finite,
we require that $\alpha \to \infty \Rightarrow g^* \to f$. Consequently, from (15)
and (16), $\hat g_j \to f_j$ whenever $l_l \to 1$ and $l_u \to 1$, which explains the
asymptotic behavior in Figure 6 for any pair $(f_0, f_1)$.
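
The growth of the divergence for a fixed $g \neq f$ can be illustrated with a pair of unit-variance Gaussians. This is a hedged sketch only: it assumes the normalized form $D(g, f; \alpha) = \left(1 - \int g^{\alpha} f^{1-\alpha}\, \mathrm{d}\mu\right)/(\alpha(1-\alpha))$, whose definition lies outside this section. The choice $g = \mathcal{N}(d, 1)$, $f = \mathcal{N}(0, 1)$ with $d = 0.5$ is hypothetical, and so are the function names. For this pair the integral has the closed form $\exp(-\alpha(1-\alpha)d^2/2)$, which the code cross-checks by numerical integration.

```python
import math


def affinity(alpha: float, d: float) -> float:
    """Closed form of the integral of g^alpha * f^(1-alpha), g=N(d,1), f=N(0,1)."""
    return math.exp(-0.5 * alpha * (1.0 - alpha) * d * d)


def alpha_divergence(alpha: float, d: float) -> float:
    """Assumed normalized alpha-divergence; undefined at alpha = 0 and alpha = 1."""
    return (1.0 - affinity(alpha, d)) / (alpha * (1.0 - alpha))


def affinity_numeric(alpha: float, d: float,
                     lo: float = -30.0, hi: float = 30.0, n: int = 20000) -> float:
    """Midpoint-rule check of the closed form, done in the log domain."""
    h = (hi - lo) / n
    log_norm = -0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for k in range(n):
        y = lo + (k + 0.5) * h
        log_g = -0.5 * (y - d) ** 2 + log_norm
        log_f = -0.5 * y * y + log_norm
        total += math.exp(alpha * log_g + (1.0 - alpha) * log_f)
    return total * h


if __name__ == "__main__":
    d = 0.5  # hypothetical mean shift between g and f
    print("closed form vs numeric at alpha = 4:",
          affinity(4.0, d), affinity_numeric(4.0, d))
    for alpha in (2.0, 4.0, 8.0, 16.0):
        print(f"alpha = {alpha:5.1f}  D = {alpha_divergence(alpha, d):.4g}")
```

With $g$ held fixed the divergence grows without bound in $\alpha$, so a finite $\epsilon_j$ can only be met if $g$ moves toward $f$, which is the argument made above.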
Based on simulation results the following are conjectured:

• For fixed $\epsilon_0$ and $\epsilon_1$, increasing $\alpha$ leads to a monotone
decrease in $l_u$ and a monotone increase in $l_l$ on $\mathbb{R}^+ \setminus \{0, 1\}$.

• For a fixed $\alpha$, increasing $\epsilon_0$, $\epsilon_1$ or both introduces a
non-decrease in $l_u$, a non-increase in $l_l$, or both, given that
$\epsilon_0$ and $\epsilon_1$ are less than their allowable maximum, cf.
Section III-C.

The proof of these conjectures is an open problem.

From (19) and (20), it is clear that given a pair $(\epsilon_0, \epsilon_1)$,
a slight change in $\alpha$ changes the equations completely and
in general $l_l$ and $l_u$ are functions of $\alpha$. In Figure 7 the
robust decision rule $\hat\delta$ for various $\alpha$ values is plotted, without
considering the dependency of $l_l$ and $l_u$ on $\alpha$. To do this,
$l_l \approx 0.605$ and $l_u \approx 1.618$, which are found for $\rho = 1$, $\alpha = 4$,
$\epsilon_0 = 0.02$ and $\epsilon_1 = 0.03$, are kept as fixed constants in (17). Then,
(17) is plotted for $\alpha \in \{0.01, 10, 100\}$. The decision rule $\hat\delta$
tends to a step-like function for increasing $\alpha$, whereas for a
smaller $\alpha$, i.e., $\alpha = 0.01$, the decision rule is almost linear on
the domain of the likelihood ratio for which $l_l < l < l_u$. This result
is also in agreement with the previous findings; $\hat\delta$ tends to a
non-randomized likelihood ratio test for $\alpha \to \infty$, for which
we obtained $\hat g_j \to f_j$, and for $(f_0, f_1)$ the optimum decision rule
is known to be a non-randomized likelihood ratio test.

In the following simulation, the simplified model ($f_0(y) =
f_1(-y)$) is tested for mean-shifted Gaussian distributions,
$F_0 \sim \mathcal{N}(\mu_0, \sigma^2)$ and $F_1 \sim \mathcal{N}(\mu_1, \sigma^2)$, with means $\mu_0 = -1$,
$\mu_1 = 1$ and variance $\sigma^2 = 1$. The parameters of the composite
test are chosen to be $\rho = 1$, $\epsilon_0 = 0.1$ and $\epsilon_1 = 0.1$. Here,
our main interest is to observe the change in the overlapping
regions of the least favorable density pairs for various $\alpha$. Figure 8
illustrates the outcome of this simulation. It can be seen that
the overlapping region is convex for a negative $\alpha$ ($\alpha = -10$),
almost constant for $\alpha = 0.01$ and concave for a positive $\alpha$
($\alpha = 10$). For the sake of clarity only three examples of $\alpha$ are
plotted.

In Figure 9 the false alarm and miss detection probabilities
of the likelihood ratio test $\delta$ for $(f_0, f_1)$ are graphed and
compared with those of the robust test $\hat\delta$ for $(\hat g_0, \hat g_1)$. Two different
robust parameter pairs and various signal-to-noise ratios
(SNRs), i.e., $\mathrm{SNR} = 20\log(A/\sigma)$, are considered. It can be
seen that increasing the robustness parameters increases the
false alarm and miss detection probabilities for all SNRs,
as expected. The difference between the false alarm and miss
detection probabilities for the same robust test is small and
it is more pronounced for low SNRs. For high SNRs the
performance of the two robust tests is close. The
reason is that for high SNRs the maximum allowable robustness
parameters become relatively high compared to the parameters
of both robust settings. Although the nominal test has the
lowest error rates, its performance can degrade considerably
under uncertainties in the nominal model. The robust tests, on
the other hand, have slightly higher error rates but a guaranteed
power of the test, which indicates the trade-off between
performance and robustness. Finally, in the last simulation, the
3D boundary surface of the maximum robustness parameters
is determined for $\alpha = 0.5$ via (45) and is shown in Figure 10.
This surface has a cropped, rotated cone-like shape, which is
symmetric about its main diagonal, i.e., with respect to the
plane $\epsilon_0 = \epsilon_1$ in the space $(\epsilon_0, \epsilon_1, a)$. Notice that except
for the points on the cone-like shape that intersect with the
$(\epsilon_0, \epsilon_1, a = 0)$ plane, all other points on $(\epsilon_0, \epsilon_1, a = 0)$ that are
plotted in blue are undefined (rather than being valid
points with $a = 0$), implying that for those points no minimax
robust test exists.
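
A slice of this boundary surface at a fixed $a$ can be traced by solving (44) for $\epsilon_1$. The sketch below is an illustration under stated assumptions, not the procedure used to produce Figure 10: it assumes (44) reads $2\epsilon_1(a(\epsilon_0 - 4) + 4) - (4a + \epsilon_0 - 4)^2 - \epsilon_1^2 = 0$, the function names and the value $a = 0.5$ are hypothetical, and it takes the smaller of the two roots in $\epsilon_1$ because that root reproduces (45) and (46).

```python
import math


def eps1_boundary(eps0: float, a: float) -> float:
    """Smaller root of (44) in eps1, for given eps0 and a."""
    b = a * (eps0 - 4.0) + 4.0
    # Discriminant of the quadratic in eps1; it factors as below.
    disc = eps0 * (8.0 - eps0) * (1.0 - a * a)
    return b - math.sqrt(disc)


def a_from_eps(eps0: float, eps1: float) -> float:
    """Equation (45), used here only as a round-trip check."""
    disc = (eps0 - 8.0) * eps0 * (eps1 - 8.0) * eps1
    return (16.0 - 4.0 * eps1 + eps0 * (eps1 - 4.0) - math.sqrt(disc)) / 16.0


if __name__ == "__main__":
    a = 0.5  # hypothetical value, chosen only for illustration
    for eps0 in (0.0, 0.2, 0.5, 1.0, 2.0):
        eps1 = eps1_boundary(eps0, a)
        print(f"eps0 = {eps0:.2f} -> eps1 = {eps1:.4f}, "
              f"a recovered from (45) = {a_from_eps(eps0, eps1):.4f}")
```

The slice is symmetric about $\epsilon_0 = \epsilon_1$: feeding the returned $\epsilon_1$ back in as $\epsilon_0$ recovers the original $\epsilon_0$.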

V. Conclusion 

A robust version of the likelihood ratio test considering the
$\alpha$-divergence as the distance to characterize the uncertainty
sets has been proposed. The existence of a saddle value to the
minimax optimization problem was shown by adopting Sion's
minimax theorem. The least favorable distributions, the robust
decision rule as well as the robust version of the likelihood
ratio test were derived in two parameters and in three distinct
regions on the co-domain of the nominal likelihood ratio.
Two equations from which the parameters can be determined
were also derived. It was found that the robust likelihood
ratio does not depend on the parameter $\alpha$ that characterizes the
distance between the probability measures. When the nominal
density functions satisfy a symmetry constraint, the two
non-linear equations were combined into a single equation. Finally,
the upper bounds on the parameters that control the degree
of robustness were derived. Open problems include proving
the monotonicity of the parameters $l_l$ and $l_u$ for increasing
$(\epsilon_0, \epsilon_1)$ or $\alpha$. The simulation results illustrate
the theoretical results.

Fig. 3. Nominal densities and the corresponding least favorable densities for
$\rho = 1$, $\alpha = 4$, $\epsilon_0 = 0.02$ and $\epsilon_1 = 0.03$.

Fig. 4. Nominal densities and the corresponding least favorable densities for
$\rho = 1.2$, $\alpha = 4$, $\epsilon_0 = 0.02$ and $\epsilon_1 = 0.03$.


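Fig. 5. Nominal and least favorable likelihood ratios $\hat g_1/\hat g_0$ for $\rho = 1$ and
for $\rho = 1.2$, with $\alpha = 4$, $\epsilon_0 = 0.02$ and $\epsilon_1 = 0.03$.

Fig. 8. Nominal densities and the corresponding least favorable densities for
$\rho = 1$, $\epsilon_0 = 0.1$ and $\epsilon_1 = 0.1$.

Fig. 9. False alarm and miss detection probabilities of the likelihood ratio test $\delta$ ($\rho = 1$) for
$(f_0, f_1)$ compared to those of the robust decision rule $\hat\delta$ for $(\hat g_0, \hat g_1)$ when
the SNR is varied.

Fig. 10. All allowable pairs of maximum robustness parameters, $(\epsilon_0, \epsilon_1)$,
w.r.t. all distances $a \in [0, 1]$ for $\alpha = 0.5$.

Appendix A
Simplification of $\Phi_1$

From (31) consider the following steps for

$$\Phi_1 = \left(\frac{-1 + \lambda_0 + \lambda_1 + \mu_0 + \mu_1 - \alpha(-1 + \mu_0 + \mu_1)}{\lambda_1 + \lambda_0 (l/\rho)^{\alpha-1}}\right)^{\frac{1}{\alpha-1}}.$$

• Dividing the numerator and the denominator by $\lambda_0$ and
replacing the term $1 + \mu_0/\lambda_0 - \alpha\mu_0/\lambda_0$ by $c_1^{\alpha-1}$ results in

$$\left(\frac{c_1^{\alpha-1} + (\lambda_1 - 1 + \mu_1 + \alpha - \alpha\mu_1)/\lambda_0}{(\lambda_1/\lambda_0) + (l/\rho)^{\alpha-1}}\right)^{\frac{1}{\alpha-1}}.$$

• Multiplying the numerator and the denominator of the
result of the previous step by $\lambda_0/\lambda_1$, replacing the term
$1 - 1/\lambda_1 + \mu_1/\lambda_1 + \alpha/\lambda_1 - \alpha\mu_1/\lambda_1$ by $c_3^{\alpha-1}$ and again
multiplying both the numerator and the denominator by
$\lambda_1$ gives

$$\left(\frac{\lambda_0 c_1^{\alpha-1} + \lambda_1 c_3^{\alpha-1}}{\lambda_1 + \lambda_0 (l/\rho)^{\alpha-1}}\right)^{\frac{1}{\alpha-1}}.$$

• The result of the previous step is free of the parameters $\mu_0$
and $\mu_1$, but still parameterized by $\lambda_0$ and $\lambda_1$. To eliminate
them, using the identities $\lambda_0 = (1 - \alpha)/(c_1^{\alpha-1} - c_2^{\alpha-1})$
and $\lambda_1 = (1 - \alpha)/(c_4^{\alpha-1} - c_3^{\alpha-1})$ leads to

$$\left(\frac{(c_1 c_4)^{\alpha-1} - (c_2 c_3)^{\alpha-1}}{c_1^{\alpha-1} - c_2^{\alpha-1} + (c_4^{\alpha-1} - c_3^{\alpha-1})(l/\rho)^{\alpha-1}}\right)^{\frac{1}{\alpha-1}}.$$

• The result from the previous step depends only on $c_1$, $c_2$,
$c_3$, $c_4$ and $\alpha$. Using the substitutions $c_1 = c_3 l_l$, $c_2 = c_4 l_u$
and $c_4 = k(l_l, l_u) c_3$ yields

$$\Phi_1(l, l_l, l_u, c_3; \alpha, \rho) = k(l_l, l_u)\, c_3 \left(\frac{l_l^{\alpha-1} - l_u^{\alpha-1}}{l_l^{\alpha-1} - (k(l_l, l_u)\, l_u)^{\alpha-1} + (k(l_l, l_u)^{\alpha-1} - 1)(l/\rho)^{\alpha-1}}\right)^{\frac{1}{\alpha-1}}. \tag{51}$$

Appendix B
Simplification of $\hat\delta$

Since the equivalence of the integration domains is given by
(14), only

$$\hat\delta = \frac{\lambda_0(-1 + \alpha + \lambda_1 + \mu_1 - \alpha\mu_1)}{(-1 + \alpha)(\lambda_0 + \lambda_1 (l/\rho)^{1-\alpha})} - \frac{\lambda_1(\lambda_0 + \mu_0 - \alpha\mu_0)(l/\rho)^{1-\alpha}}{(-1 + \alpha)(\lambda_0 + \lambda_1 (l/\rho)^{1-\alpha})}$$

is required to be simplified. In the following, the simplification
is performed in three steps and the domain term is
omitted for the sake of simplicity:

• Dividing the numerator and the denominator of the first
term by $\lambda_1$ and of the second term by $\lambda_0$, and replacing the
related terms by $c_3^{\alpha-1}$ and $c_1^{\alpha-1}$ results in

$$\hat\delta = \frac{\lambda_0}{-1 + \alpha} \cdot \frac{c_3^{\alpha-1}}{\lambda_0/\lambda_1 + (l/\rho)^{1-\alpha}} - \frac{\lambda_1}{-1 + \alpha} \cdot \frac{c_1^{\alpha-1}(l/\rho)^{1-\alpha}}{1 + (\lambda_1/\lambda_0)(l/\rho)^{1-\alpha}} = \frac{c_3^{\alpha-1} - c_1^{\alpha-1}(l/\rho)^{1-\alpha}}{(-1 + \alpha)\left(\lambda_1^{-1} + \lambda_0^{-1}(l/\rho)^{1-\alpha}\right)}.$$

• The result of the previous step is free of the parameters $\mu_0$
and $\mu_1$, but still parameterized by $\lambda_0$ and $\lambda_1$. To eliminate
them, using the identities $\lambda_0 = (1 - \alpha)/(c_1^{\alpha-1} - c_2^{\alpha-1})$
and $\lambda_1 = (1 - \alpha)/(c_4^{\alpha-1} - c_3^{\alpha-1})$ leads to

$$\hat\delta = \frac{(l/\rho)^{1-\alpha} c_1^{\alpha-1} - c_3^{\alpha-1}}{c_4^{\alpha-1} - c_3^{\alpha-1} + (c_1^{\alpha-1} - c_2^{\alpha-1})(l/\rho)^{1-\alpha}}.$$

• The result from the previous step depends only on $c_1$, $c_2$,
$c_3$, $c_4$ and $\alpha$. Using the substitutions $c_1 = c_3 l_l$, $c_2 = c_4 l_u$
and $c_4 = k(l_l, l_u) c_3$ yields

$$\hat\delta = \frac{l_l^{\alpha-1}(l/\rho)^{1-\alpha} - 1}{\left(l_l^{\alpha-1} - (k(l_l, l_u)\, l_u)^{\alpha-1}\right)(l/\rho)^{1-\alpha} + k(l_l, l_u)^{\alpha-1} - 1}$$

as wanted.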